
%0 Conference Proceedings
%T Classification of Life Events on Social Media
%D 2016
%A Cavalin, Paulo,
%A Dornelas, Fillipe,
%A Cruz, Sergio,
%@affiliation IBM Research
%@affiliation IBM Research, Universidade Federal Rural do Rio de Janeiro
%@affiliation Universidade Federal Rural do Rio de Janeiro
%E Aliaga, Daniel G.,
%E Davis, Larry S.,
%E Farias, Ricardo C.,
%E Fernandes, Leandro A. F.,
%E Gibson, Stuart J.,
%E Giraldi, Gilson A.,
%E Gois, João Paulo,
%E Maciel, Anderson,
%E Menotti, David,
%E Miranda, Paulo A. V.,
%E Musse, Soraia,
%E Namikawa, Laercio,
%E Pamplona, Mauricio,
%E Papa, João Paulo,
%E Santos, Jefersson dos,
%E Schwartz, William Robson,
%E Thomaz, Carlos E.,
%B Conference on Graphics, Patterns and Images, 29 (SIBGRAPI)
%C São José dos Campos, SP, Brazil
%8 4-7 Oct. 2016
%I Sociedade Brasileira de Computação
%J Porto Alegre
%S Proceedings
%K Social Media, Life Events, Classification, Unbalanced datasets.
%X In this paper we present an investigation of life event classification on social media networks. Detecting personal mentions of life events, such as travel, birthdays, and weddings, presents an interesting opportunity to anticipate the offer of products or services, as well as to enhance the demographics of a given target population. Nevertheless, life event classification can be seen as an unbalanced classification problem, where the set of posts that actually mention a life event is significantly smaller than the set of those that do not. For this reason, the main goal of this paper is to investigate different types of classifiers, using an experimental protocol based on datasets containing various types of life events in both Portuguese and English, and the benefits of over-sampling techniques for improving the accuracy of these classifiers on these sets. The results demonstrate that Logistic Regression may be a poor choice for the original datasets, but after over-sampling the training set, this classifier is able to outperform by a significant margin other classifiers such as Naive Bayes and Nearest Neighbours, which in most cases do not benefit as much from the over-sampled training set.
%@language en
%3 SibgrapiWIA_LifeEvents_2016_cameraready.pdf
